Identification of a viral gene essential for the genome replication of a domesticated endogenous virus in ichneumonid parasitoid wasps

Thousands of endoparasitoid wasp species in the families Braconidae and Ichneumonidae harbor "domesticated endogenous viruses" (DEVs) in their genomes. This study focuses on ichneumonid DEVs, named ichnoviruses (IVs). Large quantities of DNA-containing IV virions are produced in ovary calyx cells during the pupal and adult stages of female wasps. Females parasitize host insects by injecting eggs and virions into the body cavity. After injection, virions rapidly infect host cells which is followed by expression of IV genes that promote the successful development of wasp offspring. IV genomes consist of two components: proviral segment loci that serve as templates for circular dsDNAs that are packaged into capsids, and genes from an ancestral virus that produce virions. In this study, we generated a chromosome-scale genome assembly for Hyposoter didymator that harbors H. didymator ichnovirus (HdIV). We identified a total of 67 HdIV loci that are amplified in calyx cells during the wasp pupal stage. We then focused on an HdIV gene, U16, which is transcribed in calyx cells during the initial stages of replication. Sequence analysis indicated that U16 contains a conserved domain in primases from select other viruses. Knockdown of U16 by RNA interference inhibited virion morphogenesis in calyx cells. Genome-wide analysis indicated U16 knockdown also inhibited amplification of HdIV loci in calyx cells. Altogether, our results identified several previously unknown HdIV loci, demonstrated that all HdIV loci are amplified in calyx cells during the pupal stage, and showed that U16 is required for amplification and virion morphogenesis.


Introduction
Endogenous viral elements (EVEs) refer to viral sequences in eukaryotic genomes that originate from complete or partial integration of a viral genome into the germline [1].While retroviruses are the best-known sources of EVEs, bioinformatic studies have also identified nonretroviral EVEs across a diverse range of organisms [2].Although many EVEs become nonfunctional and decay through neutral evolution [3], some have been preserved and repurposed by their hosts for new functions, often as short regulatory sequences or individual genes [4,5].A notable exception to this pattern is observed in domesticated endogenous viruses (DEVs) that have been identified in four lineages of endoparasitoid wasps-insects that lay eggs and develop within the bodies of other insects [6].Parasitoid DEVs consist of numerous genes conserved within the wasp genome that originate from the integration of complete viral genomes.Unlike other EVEs, these genes remain functional and actively interact to produce viral particles in calyx cells, which are located in the apical part of the oviducts of female wasps [7].Viral particles are produced in the pupal and adult stages and accumulate in the oviducts of the wasp.Adult female wasps inject these particles along with eggs into insect hosts where they have essential functions in the successful development of wasp offspring [8].
DEVs are most prevalent among species in two wasp families named the Braconidae and Ichneumonidae.The DEVs identified in these two wasp families have evolved from different viral ancestors but through convergence have been similarly repurposed to produce either virions containing circular double-stranded (ds) DNAs or virus-like particles (VLPs) lacking nucleic acid.The hyperdiverse microgastroid complex in the family Braconidae harbors DEVs named bracoviruses (BVs) that evolved from a common virus ancestor in the family Nudiviridae [9].Other braconids in the subfamily Opiinae and ichneumonids in the subfamily Campopleginae independently acquired two other distinct nudiviruses that wasps have coopted to produce VLPs [10,11].The fourth type of DEV that has been identified are named ichnoviruses (IVs).IVs are present in two ichneumonid subfamilies (Campopleginae and Banchinae) which produce virions containing circular dsDNAs.Unlike the other three DEVs, IVs evolved from a still unknown ancestor.This ancestor was likely though a large DNA virus in the phylum Nucleocytoviricota, which was formerly referred to as nucleocytoplasmic large DNA viruses (NCLDVs) [12,13,14].Together, BVs and IVs were also earlier referred to as polydnaviruses, but this term is largely no longer used because of strong evidence indicating BVs and IVs derive from different ancestors [6].
BVs have been more studied than IVs but the latter are intriguing because of their uncertain origins.Despite differences in ancestry and gene content, BV and IV genomes are similarly organized into two components that have distinct functions [15].Insights into the genome components of IVs primarily derive from sequencing two campoplegine wasps named Hyposoter didymator and Campoletis sonorensis [16], along with calyx transcriptome studies [12,13,17,18] and proteomic analyses of purified virions [12,13].The first genome component of IVs are domains in the wasp genome that show evidence of deriving from the virus ancestor and having essential functions in virion formation.These domains, named "Ichnovirus Structural Protein Encoding Regions" (IVSPERs), contain intronless genes that are specifically transcribed in calyx cells [12,13,18].Most IVSPER genes are transcribed at the onset of pupation in hyaline stage 1 pupae [17], and some genes in IVSPERs encode proteins associated with IV virions [12,13].Six genes have been knocked down by RNA interference (RNAi) in H. didymator which demonstrated that they have functions in virion assembly or cell trafficking [17].Five IVSPERs have been identified in the H. didymator and C. sonorensis genomes [16], while three have been identified in the genome of the more distantly related banchine Glypta fumiferanae [13].The content of IVSPER genes is notably similar between ichneumonid wasp species [12,13,18], and their gene order is well-conserved among campoplegine species [15].Additionally, one intronless gene (U37) was identified in the H. didymator and C. sonorensis genomes outside of any IVSPER with features suggesting it also derives from the IV viral ancestor [16].Together, these genes, whether found within or outside IVSPERs, represent the fingerprints of the ancestral viral machinery essential for virion production and are designated as IV core replication genes.Notably, none of these genes are packaged in virions, indicating that IV core genes can only be transmitted vertically through the germline of associated parasitoids.
The second component of IV genomes are domains referred to as "proviral segments," which are amplified in calyx cells and produce circular, double-stranded (ds) DNAs that are packaged into capsids [19,20].The number of proviral segments, typically exceeding 50, are widely dispersed in wasp genomes and exhibit considerable variability between wasp species [16].Each proviral segment is characterized by flanking direct repeats (DRs) of variable length (<100 bp to >1 kb) and homology that identifies where homologous recombination processes occur to produce circularized, dsDNAs [19,20].Some IV proviral segments also contain internal repeats that facilitate additional homologous recombination events and produce multiple overlapping or nested circularized DNAs per proviral segment [16,19].The circularized dsDNAs produced from proviral segments encode genes with and without introns that are expressed in the hosts of wasps after virion infection [21,22,23,24].These genes also have functions that are required for the successful development of parasitoid progeny.Thus, while IV core replication genes represent the conserved viral machinery that produces virions in calyx cells, proviral segments contain the IV genome components virions transfer to parasitized hosts.
The replication of IVs, encompassing the processes leading to the production of virions containing IV segments, occurs within the nuclei of calyx cells during pupal and adult developmental stages [7,25].Electron microscopy studies of H. didymator ichnovirus (HdIV) show that fusiform-shaped capsids are individually enveloped in the nuclei of calyx cells during the late pupal stage (pigmented pupae, stage 3) [17].These enveloped "subvirions" exit the nucleus, traverse the cytoplasm, and exit calyx cells by budding, resulting in mature virions with two envelopes that accumulate in the calyx lumen of the ovaries [7,25].Earlier findings indicated that IVSPERs and proviral segments undergo amplification in newly emerged adult wasps [12].However, these data focused on only a subset of IVSPER genes and one proviral segment, leaving our knowledge of whether all IV genome components are amplified in calyx cells incomplete.Similarly, the initiation time of amplification during pupal development and IV virion production remains unknown.The specific role of IV core genes in virion production is also poorly documented when compared to BVs [26,27].The limited sequence homology of IVSPER genes with genes in other viruses provides minimal insights into potential functions.To date, only the six genes mentioned above that are involved in subvirion assembly or cell trafficking have been studied [17].
In this work, we explored IV DNA replication using the campoplegine wasp H. didymator.We first generated a chromosome-level assembly for the H. didymator genome.Through this assembly, we determined that all genome components undergo local amplification in calyx cells which initiates between pupal stages 1 and 2. Notably, IVSPERs, isolated IV core genes, and proviral segments were amplified in large regions with non-discrete boundaries.Next, we studied the function of U16 which is located on H. didymator IVSPER-3.U16 is one of the most transcribed IVSPER genes during the initial pupal stage and contains a weakly conserved domain found in the C-terminus of primases.RNAi knockdown of U16 inhibited virion formation.Knockdown also significantly reduced DNA amplification of all HdIV genome components, which decreased transcript abundance of IV core genes and the abundance of circular dsDNA viral molecules.We conclude that U16 is an essential gene for amplification of the HdIV genome and virion production, demonstrating that genes from the IV ancestor regulating IV replication have been conserved during virus domestication.Additionally, our results show that viral DNA amplification is essential for IV virion production.

Genomic localization of Hyposoter didymator IV components in a novel chromosome-level assembly
The genome assembly for H. didymator we previously generated [16] consisted of 2,591 scaffolds with an N50 of 4 Mbp.We concluded that this assembly was overly fragmented to evaluate DNA amplification in calyx cells during virion morphogenesis.We therefore used proximity ligation technology to produce a new chromosome-level assembly consisting of twelve large scaffolds that correspond with the haploid karyotype for H. didymator [28].The sizes of these scaffolds ranged from 6.7 Mbp to 29.

Mbp (Fig A and Table A in S1 Dataset).
The twelve scaffolds contained all HdIV loci previously identified: the replication genes organized in clusters (IVSPER-1 to IVSPER-5), the predicted IV replication gene (U37) located outside of an IVSPER, and 53 of the 54 previously identified proviral segments (Hd1 to Hd54) [16] (Fig 1).Only the scaffold containing Hd51 was missing from the 12 main scaffolds, possibly due to low-quality sequencing data in this region (Table A in S1 Dataset).Our chromosome-level assembly revealed that each scaffold contained at least one HdIV locus, but notably, all IVSPERs and 40% of the proviral segment loci resided on two scaffolds (7 and 11) (Fig 1).The high density of HdIV loci observed on scaffold 11 might indicate that this genomic region served as the integration site for the ancestral virus.
While three IVSPERs and the majority of proviral segments were distantly located from each other in the H. didymator genome, there were exceptions to this pattern including certain pairs of proviral segments separated by less than 20 kb (e.g., Hd36 and Hd38; Hd46 and Hd43; Hd44.1 and Hd44.2;Hd12 and Hd16 in scaffolds -1, -2 and -11 respectively).In all of these cases, the paired segments exhibited significant homology which suggested they derived from recent duplication events (Fig B in S1 Dataset).Additionally, several proviral segments were in proximity to IVSPERs or IV replication genes that resided outside of IVSPERs (e.g., Hd46

Amplification of Hyposoter didymator IV genome components in calyx cells during wasp pupal development
Prior results suggested IV proviral segment loci are amplified before processing and excision of individual segments that are packaged into capsids [19,20].An earlier study also reported that select IVSPER genes and one proviral segment located near an IVSPER are amplified in H. didymator adults [12].However, these findings did not address whether all IV genome components are amplified in calyx cells or when amplification initiates in association with virion formation.To more precisely investigate the temporal dynamics of amplification, we first conducted relative quantitative (q) PCR assays that measured copy number of genes in IVSPER-1, -2, and -3 in calyx DNA samples that were collected from stage 1-4 pupae.We compared these treatments to DNA samples from hind legs of stage 1 pupae where no HdIV replication occurs.We also included a wasp gene (XRCC1) located in close proximity to IVSPER-1.Results showed that the copy number of each tested gene was similar in calyx and hind legs in stage 1 pupae (statistical significances in We next investigated whether all or only some HdIV genome components are amplified by isolating DNA from calyx cells from stage 1 pupae, which the preceding qPCR assays detected no amplification, and stage 3 pupae, which showed high levels of amplication.We then generated paired-end libraries, which were sequenced using the Illumina platform, followed by read alignment to the new chromosome-level genome assembly.When analyzing the reads from stage 1 pupae, read depth per HdIV locus did not differ significantly from read depth of randomly selected regions of the same size from the rest of the wasp genome (Fig 2B).In contrast, for stage 3 pupae, read depth of the HdIV loci was significantly higher than that of randomly selected regions from the rest of the wasp genome (Fig 2A, read depth values in S1 Table ), suggesting most or all HdIV loci are amplified in pupal stage 3.

Differential levels of amplification across all components of the HdIV genome
While the preceding results suggested all HdIV genome components were amplified in calyx cells, the qPCR results presented in Fig 2A also indicated amplification levels varied with genes in IVSPER-3 exhibiting higher levels of amplification than genes in IVSPER-1 and -2.We therefore also analyzed read depth per position and the ratio of read depths between stage 3 and stage 1 in our Illumina data set (Figs 3 and S1).This analysis indicated that amplification levels of IVSPER loci, determined at the summit of the read depth curve, ranged from 10X for IVSPER-5 in Scaffold-7 to over 200X for IVSPER-3 in Scaffold-3 (S1 Table ).Consistent with the findings from our qPCR analysis, IVSPER-3 was more highly amplified than IVSPER-1 and -2.Results from the read depth per position curve further indicated that the peak of amplification occurs toward the middle of each IVSPER (S1 Fig), consistent with our qPCR data for PLOS PATHOGENS Proviral segment loci were relatively more amplified than IV replication gene loci, but were also variable (Figs 3 and S1).For example, the ratio of read depth between stage 3 and stage 1 ranged from 30X for proviral locus Hd40 in Scaffold-6 to over 1,100X for Hd27 in Scaffold-7 (S1 Table ) at the summit of the read depth curves.Variability in sequencing depth among proviral loci was consistent with earlier studies indicating that the circularized DNAs packaged into IV capsids are non-equimolar in abundance [8,30].
Cross Complementing 1) encoding gene is located 1,200 bp from U1 (position 3,270,470 to 3,272,519 in Scaffold-11).Data correspond to gDNA amplification relative to amplification of the housekeeping gene elongation factor 1 (ELF1).The Y-axis was transformed using the square root function for better data visualization.Statistical test results are available in Table B in S2 Dataset.(B) Read depth of HdIV loci compared to the rest of the wasp genome.Read depth values per analyzed region (see Materials and methods) are presented for each locus type (proviral segments and IVSPERs) at pupal stage 1 (hyaline pupa) and pupal stage 3 (pigmented pupa).The read depths per HdIV locus (HdIV) are compared to the read depth per random genome regions outside of HdIV loci (wasp).Note that the read depth value for random wasp regions is lower for DNA samples collected from stage 3 versus stage 1 pupae.This difference is attributed to the higher proportion of reads mapping to HdIV regions among the total number of reads in stage 3 compared to stage 1.The significance levels are indicated as follows: ns = non-significant, **p<0.01,and ***p<0.001.Statistical test results are available at https://github.com/flegeai/EVE_amplification/blob/main/notebooks/Count_tables.ipynb.https://doi.org/10.1371/journal.ppat.1011980.g002Two plateaus corresponded to reads mapping to the predicted locations of Hd11-1 (235X) and Hd11-2 (111X), while the central region with higher read depth (311X) corresponded to reads mapping to both nested segments (Fig 4A).This differential read depth would not be expected if reads mapped only to Hd11 chromosomal DNA.Consequently, the pattern of proviral segment amplification suggested part of the read depth values were due to reads arising from amplification intermediates and/or circularized dsDNAs that were also present in our DNA samples.Some amplified HdIV loci contain both an IVSPER and proviral segments.Two of these loci resided on Scaffold-11 (Hd29, IVSPER-2, Hd24, and Hd33, Hd15, IVSPER-1 (Fig 4B )).For these loci, the amplification curves spanned the length of the amplified region (yellow dotted line in Fig 4B ) but were interrupted by peaks corresponding to the length of proviral segments.This pattern suggested amplification levels of the chromosomal form of the proviral segments could correspond to levels comparable to the surrounding region and to the IVSPER amplification curves, but were higher because of reads also mapping to circular dsDNAs or amplification intermediates.

H. didymator IV genome components are amplified in wasp genome domains with undefined boundaries
Since our read depth data indicated amplified regions were larger than the annotated HdIV loci (Figs 3 and S1), we used the MACS2 peak calling program, originally developed for chromatin immunoprecipitation sequencing experiments, to identify areas in the H. didymator genome that were enriched for reads when compared to a control [32].Amplification peaks were called with MACS2 using alignments from stage 3 pupae as the treatment and alignments from stage 1 pupae as the control.MACS2 identified a total of 165 regions that were enriched for reads (i.e. were amplified) (S2 Table ).All HdIV genome components that we had annotated in our earlier study [16] were included in these regions.Among the other regions identified by MACS2, manual curation (see Materials and methods section) indicated three were proviral segment loci that we named Hd52, Hd53, and Hd54.Five others contained intronless genes with features suggesting origins from the IV ancestor that were outside of IVSPERs.We thus named these potential IV replication genes U38, U39, U40, U41, and U42.The remaining regions detected by MACS2 either contained predicted wasp genes or lacked any features that identified them as IV replication genes or proviral segments.Altogether, the MACS2 algorithm predicted a total of 55 regions in the H. didymator genome containing HdIV loci.Two proviral segments (Hd45.1 on Scaffold-4 and Hd2-like on Scaffold-7) escaped MACS2 detection, possibly because they were located too close to the ends of each scaffold.However, our read mapping data clearly indicated these two segments are amplified in stage 3 (Table 1) with a profile similar to other proviral segments (S1 Fig) .In total, read mapping and MACS2 peaks indicated the H. didymator genome contains 67 HdIV loci (56 proviral segments, five Table 1.HdIV loci amplified in calyx cells from stage 3 pupae identified by read mapping and/or the MACS2 algorithm.For each scaffold, the position and size of the HdIV loci are indicated.Loci newly identified in the present work are marked with asterisks.Corresponding amplified regions (i.e., the peak predicted by the MACS2 algorithm) are provided for each locus or groups of loci.Start and end positions delimiting the HdIV loci and the amplified regions detected by MACS2 are indicated.The distance between the start or the end of the amplified region and the locus is presented.For each HdIV locus and amplified region detected by MACS2, read depth values are provided for calyx cell samples collected from stage 1 or stage 3 pupae.Read depth is based on the length of the HdIV locus or the amplified region.ND indicates amplified regions not detected by MACS2.IVSPERs, and six predicted IV replication genes that reside outside of IVSPERs) that are all amplified in calyx cells at pupal stage 3 and embedded in a total of 57 large and locally amplified regions (Fig 3 and Table 1).

Annotated proviral locus
Our results also indicated all amplified regions in the H. didymator genome containing HdIV loci consist of the annotated HdIV locus along with flanking wasp sequences consistent with our detailed analysis of the wasp gene XRCC1 that is located in close proximity to IVSPER-1 (Fig 2B).Across all HdIV loci, we determined that the flanking regions containing wasp sequences that were amplified varied from 7,000 to 15,000 bp (Table 1).The total size of the amplified regions ranged from 10,692 bp (Hd28 on Scaffold-12) to 54,005 bp (IVSPER-2  ), two other amplified regions also contained both IVSPERs and proviral segments (U37, Hd46, and Hd43 on Scaffold-2; U40 and Hd39 on Scaffold-9).The simultaneous amplification of both types of HdIV loci within the same amplicons suggested regulation by shared molecular mechanisms.In turn, we further hypothesized similar molecular mechanisms also regulate the amplification of isolated proviral segments and IVSPERs.Lastly, we searched for sequence signatures that potentially identify the amplification boundaries for each HdIV locus.However, our analysis identified only low-complexity A-tract sequences, which were not specific to HdIV components as they were also found in random non-amplified wasp genomic sequences (S3 Fig) .Thus, no motifs were identified that distinguished the amplification boundaries of HdIV loci.

RNAi knockdown of U16 inhibits virion morphogenesis
We selected the gene U16 located on H. didymator IVSPER-3 as a factor with potential functions in activating IV replication.U16 is conserved among all IV-producing wasps for which genome or transcriptome data are available (Fig 5).In H. didymator calyx cells, U16 is also one of the most transcribed IV genes detected in calyx cells from stage 1 pupae [16].Sequence analysis using the basic local alignment search tool and DeepLoc2.0predicted all U16 family members contain a C-terminal alpha-helical domain (PriCT-2) of unknown function that is present in several primases [33] and a nuclear localization signal (Fig 5 and S3 Dataset).We next assessed the effects of knocking down U16 by RNAi on virion morphogenesis in calyx cells.We injected newly pupated wasps with dsRNAs that specifically targeted U16 using previously established methods [17].RT-qPCR analysis indicated transcript abundance in the calyx of newly emerged adult females was reduced by more than 90% when compared to control wasps that were injected with dsGFP (Fig 6A).Inspection of the ovaries further indicated that the calyx lumen of control wasps contained blue 'calyx fluid' indicative of HdIV virions being present, whereas almost no calyx fluid was seen in dsU16-injected wasps (Fig 6A ).Examination of calyx cell nuclei by transmission electron microscopy similarly showed that calyx cells in one-day-old control females contained an abundance of subvirions, whereas no subvirions were observed in treatment wasps (Fig 6B).We thus concluded that U16 is required for virion morphogenesis.

RNAi knockdown of U16 also disables amplification of HdIV loci
Since U16 contained a domain found in primases, we investigated whether RNAi knockdown also disabled amplification of HdIV genome components.We injected newly pupated wasps with dsU16 or dsGFP, followed by isolation and deep sequencing of calyx cell DNA from stage 3 pupae in three independent replicates.Mapping the reads from dsGFP-treated calyx samples to the H. didymator genome indicated all HdIV loci were amplified as evidenced by higher read depth values when compared to random regions of the wasp genome (Fig 7A).Silencing U16 significantly decreased read depth values when compared to dsGFP samples for both IVSPER and proviral segment loci (Fig 7A).However, read depth values remained higher for HdIV loci compared to other regions of the wasp genome in dsU16-treated samples (Fig 7A ), suggesting that dsU16 treatment did not completely abolish U16 expression across the three replicates.Upon individual analysis of each HdIV locus, we observed that read depth for every We extended our analysis by injecting dsGFP or dsU16 into newly formed pupae followed by isolation of DNA from calyx cells and hind legs where no HdIV replication occurs.We then used specific primers and qPCR assays that measured DNA abundance of three wasp genes, selected HdIV replication genes inside and outside of IVSPERs, and selected HdIV genes in different proviral segments.As anticipated, no genes were amplified in hind legs from either control or treatment wasps (   [12], Campoletis sonorensis [16], and Bathyplectes anurus [18], and in two banchine wasps Glypta fumiferanae [13] and Lissonota sp.[34].(A) Phylogenetic tree of U16 proteins inferred using a maximumlikelihood method.Branch supports are labeled near the nodes.

Impact of DNA amplification on IV replication gene transcription levels and abundance of circularized HdIV molecules in calyx cells
We hypothesized that amplification of IV replication genes would increase transcript abundance which in turn would be affected by inhibiting HdIV DNA amplification.We thus compared transcript abundance of various genes in IVSPER-1, -2, and -3, in calyx RNA samples that were collected from wasps treated with dsU16 or dsGFP.U16 knockdown reduced the expression of every HdIV replication gene we examined (Fig 9A and Table F in S2 Dataset).
Finally, we investigated the impact of U16 knockdown on the abundance of the circularized dsDNAs that are processed from amplified proviral segments.For this assay, we used PCR primers that specifically amplified the proviral form, circularized (episomal) or concatemeric form or both forms of Hd29 (Fig 9B and Table H in S2 Dataset).Results showed a significant reduction in both the proviral and circularized/concatemeric forms of Hd29 in calyx cell DNA from wasps injected with dsU16 when compared to DNA from wasps injected with dsGFP (Fig 9B and Table H in S2 Dataset).Our results thus indicated U16 is required for proviral segment amplification which is also required for production of circularized segments.

Discussion
This study significantly advances our understanding of IV replication by generating a chromosome-level assembly for the H. didymator genome, presenting several lines of evidence on the timing of HdIV loci amplification in calyx cells, and identifying U16 as an essential gene for amplification of all HdIV loci and virion formation.Read mapping to genomic DNA extracted from H. didymator pupal stages 1 and 3 further shows that all HdIV genome components are concurrently amplified in calyx cells while our qPCR data indicate amplification begins between pupal stage 1 and stage 2 and reaches its maximum in stage 4.This temporal pattern is overall similar to what has previously been shown to occur for BVs [27,35].In the braconid wasp Chelonus inanitus, DNA content of calyx cells also increases through endoreduplication [36], which we have not yet examined in H. didymator but is a future goal.Overall, our results indicate that DNA amplification of HdIV genome components in calyx cells is an essential feature of virion morphogenesis in H. didymator.
A second important advance from this study is our finding that HdIV loci are amplified with non-discrete boundaries that extend variable distances into flanking wasp DNA.This result differs from other integrated viruses such as polyomavirus where DNA undergoes "onion skin" replication, which is distinguished by replication forks that terminate at discrete boundaries [37].In contrast, IVSPER amplification does share similarities with chorion genes in Drosophila follicle cells that are amplified in regions that extend ~100 kb beyond the genes themselves without discrete termination boundaries [38,39].Similar to IVSPERs, levels of HdIV loci and the scaffold (Scaf-) they are located are indicated.(C) Read depth values for proviral segment loci in the three biological replicates of the dsU16 and dsGFP samples.For better visualization, only the scaffold (Scaf-) on which the proviral segments are located is indicated.The list of the proviral segment loci within each scaffold is available in Table 1.The y-axis was transformed by the log function for better data visualization.In (A), significance levels are indicated as follows: **p<0.01,and ***p<0.001.In (B) and (C), all differences between dsU16 and dsGFP samples are statistically significant at p<0.05.Statistical test results are available at https://github.com/flegeai/EVE_amplification/blob/main/notebooks/Count_tables.ipynb.https://doi.org/10.1371/journal.ppat.1011980.g007DNA amplification in Drosophila follicle cells also vary between amplicons [39,40].Amplification of chorion gene loci is associated with multiple origins of replication (ORs) that are interspersed within each gene cluster.This results in overlapping bidirectional replication forks progressing outward on either side of the ORs [40].Thus, IVSPERs and IV replication genes may also contain multiple ORs but additional approaches, such as nascent strand sequencing based on λ-exonuclease enrichment [41], will be necessary to determine if this is the case.
Our study also highlights that amplification of proviral segment loci is associated with a significant increase in read depth at the Direct Repeat (DR) positions bordering the proviral segments, which serve as sites for homologous recombination and circularization of the segments.This suggests that a portion of the rapid increase in read depth is due to reads mapping to amplification intermediates and circularized segments.The presence of concatemeric intermediates and circularized HdIV DNAs in the genomic DNA samples we sequenced is supported by our qPCR results for segment Hd29 (Fig 9B).Quantifying the proportion of reads mapping to the chromosomal form of HdIV segments was not possible in this study because our paired-end read data cannot discriminate between chromosomal HdIV DNA, potential replication intermediates, or circularized DNA.Long-read data as generated for some BVs [42] will be required to identify HdIV replication intermediates.Nonetheless, our results overall suggest HdIV proviral segment amplification involves both local chromosomal amplification and amplification of intermediates related to producing the circular dsDNAs that are packaged into capsids.
Our interest in U16 stemmed from previous results indicating it is transcriptionally upregulated in calyx cells before the appearance of envelope and capsid components [17].Sequence analysis during this study revealed a PriCT-2 domain in U16, known from primases in herpesviruses, whose function is unknown but may facilitate the association of the large primase domain (AEP) with DNA [33,43].Although other known primase domains were not identified in the U16 sequence, the presence of a PriCT-2 domain suggested this protein might play a role in the replication of HdIV genome components.Additionally, our RNAi experiments demonstrate that U16 knockdown resulted in the complete absence of virion production in calyx cell nuclei and calyx fluid.These observations indicated an essential role for U16 in the early stages of viral replication, potentially involved in the amplification of HdIV genome components and/or the transcriptional regulation of IV replication genes.Subsequently, we analyzed the genome-wide impact of RNAi knockdown of U16 on HdIV loci amplification, revealing that this gene is crucial for the amplification of all H. didymator IV genome components.In the case of IV replication genes, reduced amplification was accompanied by a simultaneous significant reduction in transcript abundance, likely resulting in insufficient amounts of HdIV structural proteins.However, amplification and transcription abundance levels did not fully correlate with each other.For instance, U11 and IVSP3-1 (both located on IVSPER-2) exhibit similar amplification patterns (Fig 2), but earlier findings showed that transcript abundances were not the same in calyx cells [16].Thus, differences in gene expression observed among genes located within the same amplified regions (Fig 2 ) could also be affected by promoter strength or other factors.On the other hand, inhibition of proviral segment loci amplification had consequences for the abundance of the circularized dsDNA that is packaged into capsids, which were drastically reduced.Thus, our results identify U16 as an essential protein for virion morphogenesis.However, its precise role in viral replication remains to be understood.Questions to be addressed in the future include whether U16 acts at the initiation or elongation step of HdIV DNA replication, whether it interacts directly with DNA, or with proteins from the replisome complex, which itself could be composed of a mixture of HdIV and wasp proteins.
BVs share some features with IVs but also exhibit differences.Notably, in contrast to IVs, where most core genes with functions in virion morphogenesis reside in IVSPERs, many BV core replication genes are widely dispersed in the genomes of wasps [44,45,46] and are not amplified in calyx cells during virion morphogenesis [41].However, some BV-producing wasp genomes contain a ~400 Kb DNA domain housing several nudiviral core genes, known as the nudivirus-like cluster, potentially marking the integration site of the nudivirus ancestor of BVs in the common ancestor of microgastroid braconids [9].Notably, the nudivirus-like cluster is amplified with non-discrete boundaries [42], similar to what is reported for IV genome components in this study.The observed similarity in the amplification pattern between the BV nudivirus cluster and the proviral components of IVs could suggest they are amplified through a common mechanism, even though the specific genes involved differ.
BV genomes also contain proviral segment loci with boundaries defined by flanking DRs and amplified in regions that include flanking regions outside of each DR.However, unlike IV proviral segments, the amplified flanking regions in BVs contain very precise nucleotide junctions that identify the boundaries of amplification [42,47].It is also known that some BV proviral segments are amplified as head-to-tail concatemers, consistent with a rolling circle amplification mechanism, while others are amplified as head-to-head and tail-to-tail concatemers, suggesting amplification by different mechanisms.However, all of these concatemers are similarly processed into circular DNAs by recombination at a precise site within DRs, which is a tetramer conserved in all BV segments [42,47].Nudiviral genes encoding tyrosine recombinases are further known to mediate this homologous recombination event [26,27].These types of molecules could also be present in IV genomes and need to be discovered.Currently, a detailed comparison between BV and IV proviral segment amplification is challenging and will require more information about the machinery involved in the processing of IV proviral segments into circular dsDNAs that are packaged into capsids.
Collectively, our results identify U16 as a gene deriving from the IV ancestor that is required for HdIV DNA replication.This indicates that viral regulatory factors required for DNA amplification other than U16 have been preserved in parasitoid genomes, which may also interact with wasp cellular machinery in regulating viral replication.Furthermore, this work emphasizes the value of studying original endogenized viruses, such as those found in parasitoids, to unveil new regulators of DNA processing.

Dovetail Omni-C Library preparation and sequencing
DNA from 10 male offspring (i.e., haploid genomes) from a single female H. didymator was sent on dry ice to Dovetail Genomics for Omni-C library construction.In the process of constructing the Dovetail Omni-C library, chromatin was fixed in place within the nucleus using formaldehyde and subsequently extracted.The fixed chromatin was digested with DNAse I followed by repair of chromatin ends and ligation to a biotinylated bridge adapter.Proximity ligation of adapter-containing ends ensued.Post-proximity ligation, crosslinks were reversed, and the DNA was purified.The purified DNA underwent treatment to eliminate biotin not internal to ligated fragments.Sequencing libraries were generated utilizing NEBNext Ultra enzymes and Illumina-compatible adapters.Fragments containing biotin were isolated using streptavidin beads before PCR enrichment of each library.The library was sequenced using the Illumina HiSeqX platform, which generated approximately 30x read depth.Subsequently, HiRise utilized reads with a mapping quality greater than 50 (MQ>50) for scaffolding purposes.

Scaffolding the assembly with HiRise
The de novo assembly from [16], and the Dovetail OmniC library reads served as input data for HiRise, a specialized software pipeline designed for leveraging proximity ligation data to scaffold genome assemblies, as outlined by [49].The sequences from the Dovetail OmniC library were aligned to the initial draft assembly using the bwa tool (available at https://github.com/lh3/bwa).HiRise then analyzed the separations of Dovetail OmniC read pairs mapped within the draft scaffolds.This analysis generated a likelihood model for the genomic distance between read pairs.The model was subsequently employed to identify and rectify putative misjoins, score potential joins, and execute joins above a specified threshold.A contact map was generated from a BAM file by utilizing read pairs where both ends were aligned with a mapping quality of 60.

Genomic DNA (gDNA) extraction for high throughput sequencing
Comparative analysis of two pupal stages.Genomic DNA (gDNA) was extracted from pooled calyx samples dissected from H. didymator female pupae at stage 1 (~60 females) and stage 3 (~50 females).Since the aim was to compare the two developmental pupal stages, a single replicate was done for each stage.Impact of U16 knockdown.Genomic DNA from calyces was collected from stage 3 female pupae that were injected with dsGFP and dsU16.This experiment involved three biological replicates, each corresponding to 30 to 50 calyx samples.Genomic DNA was extracted using the phenol-chloroform method.Briefly, calyx samples were incubated in proteinase K (Ambion, 0.5 μg/μl) and Sarkosyl detergent (Sigma, 20%), followed by treatment with RNAse (Promega, 0.3 μg/μl).Total genomic DNA was then extracted through phenol-chloroform extraction and ethanol precipitation.Following extraction, gDNA was quantified using a QBIT fluorometer (ThermoFisher) and subsequently sent for sequencing to Genewiz/Azenta company.Paired-end sequencing was carried out using Illumina technology and NovaSeq 2x150bp platform.

NGS data analyses
Illumina reads were aligned to the updated version of the H. didymator genome using bwa mem [50], version 0.7.17, with default parameters.Subsequently, the aligned reads were converted to BAM files utilizing samtools view (version 1.15) [51].
Prediction of the amplified regions.Amplification peaks were identified using MACS2 [31] by comparing the pupal stage 3 alignment file as treatment and the pupal stage 1 alignment file as control.The specified parameters for this analysis were:-broad-nomodel -g 1.8e8 -q 0.01-min-length 5000.Out of the 165 predicted peaks (i.e., amplified regions), only those with a fold change (FC) higher than 2 were retained for further analyses, resulting in a total of 59 peaks.These 59 peaks encompassed all known proviral loci, except for Hd40, which had a slightly lower value than the specified threshold (FC = 1.9), and Hd45.1 and Hd2-like, located too close to the scaffold end and potentially missed.For the predicted peaks with FC>2 that did not correspond to known proviral loci, a manual curation was performed to determine whether these regions corresponded to HdIV loci.Proviral segments were identified by their flanking direct repeats (DRs) and gene contents, specifically the presence of genes belonging to IV segment conserved gene families.To identify putative core IV replication genes, genes present in the MACS2 peak were analyzed.Only those with no similarity to wasp proteins and that were transcribed in calyx cells (based on the available transcriptome from [17]) were retained.
Comparison of read depths between HdIV loci and the rest of the wasp genome.Read depths were computed for various types of virus-related genomic regions, including each locus (IVSPERs, IV replication genes outside IVSPERs, proviral segments), and each MACS2-detected amplified region.To calculate read depth for non-viral regions of the wasp genome, one hundred sets of random regions mimicking the size distribution of HdIV loci were generated using the shuffle tool from bedtools version 2.27 [52].This was achieved by utilizing the bed file of HdIV loci (56 for proviral segments and 11 for IVSPERs) as parameters for the shuffle tool.Raw read counts were determined for each proviral and wasp genomic region using fea-tureCounts [53] from the Subread package (version 2.0.1) with the parameters (-c -P -s 0 -O).Subsequently, read depth values were computed for each pupal stage (stage 1, St1 and stage 3, St3), as well as for each experiment (dsGFP and dsU16) and each replicate with a custom script available at https://github.com/flegeai/EVE_amplification.Read depth values for each region were calculated by dividing the number of fragments mapped to the region by the size of the region (expressed in kilobase pairs, kbp), and further normalized by the depth of the library (expressed in million reads).
Genome read depths per position on H. didymator scaffolds (Counts per Million, CPM) and Maximal value of amplification per proviral locus.Genome read depths per position in 10 bp bins were acquired using the BamCoverage tool from the deeptools package [54] with the options:-normalizeUsing CPM and -bs 10.Subsequently, for each 10 bp bin, the pupal stage 3 (St3) versus stage 1 (St1) ratio was computed through an in-house script available at https://github.com/flegeai/EVE_amplification.This script utilized the pyBigWig python library from deeptools [54].To determine the maximal counts per million (CPM) at each stage for every proviral locus, an in-house script importing the pyBigWig python library was employed.The maximum CPM value for the "stage 3 / stage 1" ratio was then calculated based on the 10 bp bin bigwig file, specifically for the position displaying the highest CPM value at stage 3 (summit).
Search for motifs at the HdIV amplified regions boundaries.The MEME suite [55] was employed for analyses using default parameters and a search for six motifs.A dataset comprising a total of 110 sequences, each spanning 1,000 nucleotides on both sides of the start and end positions of the 55 HdIV amplified regions predicted by the MACS2 algorithm, was utilized for this analysis.As a control, a parallel analysis was conducted using 110 sequences, each 2,000 nucleotides in length, randomly selected from locations within the H. didymator genome but outside the HdIV loci.This control dataset allowed for the comparison of motif patterns between the HdIV amplified regions and randomly chosen genomic regions.

Genomic DNA extraction for gDNA amplification analyses by quantitative real-time PCR
To assess the level of DNA amplification, total genomic DNA (gDNA) was extracted using the DNeasy Blood & Tissue Kit (Qiagen) following the manufacturer's protocol.Ovaries (ovarioles removed) and hind legs, representing the negative control, were dissected from ten pupae at four different stages.Three replicates were generated for each pupal stage.Quantification of target gene amplification was conducted through quantitative PCR, utilizing LightCycler 480 SYBR Green I Master Mix (Roche) in 384-well plates (Roche).The total reaction volume per well was 3 μl, comprising 1.75 μl of the reaction mix (1.49 μl SYBR Green I Master Mix, 0.1 μl nuclease-free water, and 0.16 μl diluted primer), and 1.25 μl of each gDNA sample diluted to achieve a concentration of 1.2 ng/μl.Primers used are listed in S4 Table .The gDNA levels corresponding to the viral genes and the housekeeping wasp gene (elongation factor (ELF-1)) were determined using the LightCycler 480 System (Roche).The cycling conditions involved heating at 95˚C for 10 min, followed by 45 cycles of 95˚C for 10 s, 58˚C for 10 s, and 72˚C for 10 s.Each sample was evaluated in triplicate.The obtained DNA levels were normalized with respect to the wasp gene ELF-1.Raw data are provided in S2 Dataset.

Total RNA extraction
Total RNA was extracted from ovaries (ovarioles removed) dissected from pupae at different stages using the Qiagen RNeasy extraction kit in accordance with the manufacturer's protocol.To control for gene silencing, total RNAs were also extracted from individual adult wasp abdomens (2 to 4 days old).For this, Trizol reagent (Ambion) was initially used followed by extraction using the NucleoSpin RNA kit (Macherey-Nagel).Isolated RNA was then subjected to DNase treatment using the TURBO DNA-free Kit (Life Technologies) to assure removal of any residual genomic DNA from the RNA samples.

Protein sequence analyses
Conserved domains of U16 were identified using the CD-search tool available through NCBI's conserved domain database resource [56,57].Subcellular localization predictions were made using the DeepLoc-2.0tool, a deep learning-based approach for predicting the subcellular localization of eukaryotic proteins [58].Protein multiple sequence alignment and phylogenetic tree were generated using MEGA11 software [59], using MUSCLE for protein alignment and Maximum Likelihood method to generate the phylogenetic tree.The alignment was visualized using Jalview [60].Structure predictions for U16 were carried out using the MPI Bioinformatics Toolkit [61].

RNA interference (RNAi)
Gene-specific double-stranded RNA (dsRNA) used for RNAi experiments was prepared using the T7 RiboMAX Express RNAi System (Promega).Initially, a 350-450 bp fragment corresponding to the U16 sequence was cloned into the double T7 vector L4440 (a gift from Andrew Fire, Addgene plasmid # 1654).Subsequently, an in vitro transcription template DNA was PCR amplified with a T7 primer, and this template was used to synthesize sense and antisense RNA strands with T7 RNA polymerase at 37˚C for 5 hours.The primers used for dsRNA production are listed in S4 Table .After annealing and DNase treatment using the TURBO DNAfree Kit (Life Technologies), the purified dsRNAs were resuspended in nuclease-free water, quantified using a NanoDrop ND-1000 Spectrophotometer (Thermo Scientific), and examined by agarose gel electrophoresis to ensure their integrity.Injections were performed in less than one-day-old female pupae using a microinjector (Fentojet Express, Eppendorf) and a micromanipulator (Narishige).Approximately 0.3-0.6 μl of 500 ng/μl dsRNA was injected into each individual.Control wasps were injected with a non-specific dsRNA homologous to the green fluorescent protein (GFP) gene.Treated pupae were kept in an incubator until adult emergence, which occurred approximately 5 days after injection.

Transmission electron microscopy
Ovaries were dissected from adult wasps between 2 and 3 days after emergence, following the procedures outlined in [17].To ensure consistency of the observed phenotype, at least three females (taken at different microinjection dates) were observed for each tested dsRNA.For transmission electron microscopy (TEM) observations, calyces were fixed in a solution of 2% glutaraldehyde in PBS for 2 hours and then post-fixed in 2% osmium tetroxide in the same buffer for 1 hour.Tissues were subsequently bulk-stained for 2 hours in a 5% aqueous uranyl acetate solution, dehydrated in ethanol, and embedded in EM812 resin (EMS).Ultrathin sections were double-stained with Uranyless (DeltaMicroscopy) and lead citrate before examination under a Jeol 1200 EXII electron microscope at 100 kV (MEA Platform, University of Montpellier).Images were captured with an EMSIS Olympus Quemesa 11 Megapixels camera and analyzed using ImageJ software [62].

Reverse-transcriptase quantitative real-time PCR (RT-qPCR)
For RT-qPCR assays, 400 ng of total RNA was reverse-transcribed using the SuperScript III Reverse Transcriptase kit (Life Technologies) and oligo(dT)15 primer (Promega).The mRNA transcript levels of selected IVSPER genes were measured by quantitative reverse transcription-PCR (qRT-PCR) using a LightCycler 480 System (Roche) and SYBR Green I Master Mix (Roche).Expression levels were normalized relative to a housekeeping wasp gene (elongation factor 1 ELF-1).Each sample was evaluated in triplicate, and the total reaction volume per well was 3 μl, including 0.5 μM of each primer and cDNA corresponding to 0.88 ng of total RNA.The amplification program consisted of an initial step at 95˚C for 10 min, followed by 45 cycles of 95˚C for 10 s, 58˚C for 10 s, and 72˚C for 10 s.The primers used for this analysis are listed in S4 Table.

qPCR data analysis
Data were acquired using Light-Cycler 480 software.PCR amplification efficiency (E) for each primer pair was determined by linear regression of a dilution series (5x) of the cDNA pool.Relative expression, using the housekeeping gene ELF-1 as a reference, was calculated through advanced relative quantification (Efficiency method) software provided by Light-Cycler 480 software.For statistical analyses, Levene's and Shapiro-Wilk tests were employed to verify homogeneity of variance and normal distribution of data among the tested groups.Differences in gene relative expression between developmental stages and between dsGFP and dsU16injected females were assessed using a two-tailed unpaired t-test for group comparison.In cases where homogeneity of variance was not assumed, a Welch-test was used to compare gene relative expression between groups.A p-value < 0.05 was considered significant.All statistical analyses were conducted using R [63].Detailed statistical analyses of qPCR results are provided in S2 Dataset.

Fig 1 .
Fig 1. Genome of H. didymator and position of HdIV loci.Position of the HdIV loci previously identified in [16] within the twelve large scaffolds obtained using Hi-C technology.The HdIV replication genes, organized in Ichnovirus Structural Protein Encoding Regions (IVSPERs) or that are isolated in other regions of the H. didymator genome, are highlighted in red while viral segments are indicated in black.The scaffold diagram was generated using Mapchart software [29].At the bottom of the figure, diagrams illustrate the difference between the two types of viral loci: the IVSPER which are clusters of HdIV replication genes inherited from the IV proviral ancestor, and the viral segments flanked by direct repeats located on the left (DRL) and right (DRR) sides of segments, housing virulence genes expressed in the parasitized insect.

Fig 3 .
Fig 3. HdIV DNA amplification.DNA amplification in pupal stage 3 was assessed by mapping genomic DNA Illumina reads against the 12 large H. didymator genome scaffolds.In each scaffold, red bars indicate amplified loci, with the intensity of red corresponding to increased values of the counts per million (CPM) ratio between pupal stage 3 and pupal stage 1.The positions of IVSPERs and isolated IV replication genes are indicated by purple squares, while proviral segments are indicated by green circles.For selected HdIV loci, amplification curves (representing the ratio of the CPM values calculated for 10 bp intervals between pupal stage 3 and pupal stage 1) are shown in boxes.Amplification curves for all of the annotated HdIV loci are shown in S1 Fig.Each HdIV locus is indicated in red while 10,000 bp of flanking sequence on each side of the locus is also shown.For proviral segments, loci are defined as the sequence delimited by two direct repeats; IVSPERs are defined as the region between the start and stop codon of the first and last coding sequences in the cluster; isolated IV replication genes are defined by their coding sequence.https://doi.org/10.1371/journal.ppat.1011980.g003

Fig 4 .
Fig 4. HdIV amplified regions in Scaffold-11.(A) Detail of the amplified region at the Hd11 locus.Hd11 (red bar) represents two overlapping segments Hd11-1 (defined by left and right DRs, DR1L and DR1R respectively, blue bar) and Hd11-2 (defined by DR2L and DR2R, orange bar).(B) Detail of two other amplified regions containing IVSPERs and HdIV proviral loci.In (B), amplification curves of IVSPERs are highlighted in yellow.In (A) and (B), amplification curves represent the ratio of the CPM values (calculated for 10 bp intervals) obtained in pupal stage 3 compared to pupal stage 1.For each locus, amplification values at the summit of the peaks (bold type) and at the start and end positions of HdIV segments are indicated with arrows.Each amplification curve figure was generated by Integrated Genome Viewer (IGV) [31].
Fig 8 and Table D in S2 Dataset).In dsGFP-injected control wasps, all HdIV genes were amplified in calyx cell samples (Fig 8 and Table D in S2 Dataset).Among the wasp genes, only XRCC1 exhibited significant amplification, consistent with its location within the IVSPER-1 amplified region (Fig 8 and Table D in S2 Dataset).In contrast,

Fig 7 .
Fig 7. Impact of U16 RNAi knockdown on DNA proviral amplification.(A) Comparative distribution of read depths in dsGFP-and dsU16-injected females.Read depth values in three biological replicates are given for HdIV loci (V) and random genomic regions outside of HdIV loci (W) with equal size distribution.IVSPERs and IV replication genes loci are shown in the left panel, while proviral segment loci are shown in the right panel.(B) Read depth values for IVSPERs and IV replication genes residing outside an IVSPER in three biological replicates treated with dsU16-or dsGFP.Names of

Fig 8 .Fig 9 .
Fig 8. Impact of U16 RNAi knockdown on amplification of selected wasp and HdIV genes.Relative genomic amplification of select HdIV genes in twoday-old females injected with dsGFP or dsU16.The wasp gene XRCC1, located within the amplified region of the IVSPER-1 locus, was incorporated into the analysis.Wasp histone (H1) and ribosomal protein (rpl) genes served as controls.Samples were obtained from calyx cells (where virion are produced) and hind legs (control).Statistical significance levels are denoted as follows: ns = non-significant, *p<0.05,**p<0.01,and ***p<0.001.Statistical test results are available in Table D in S2 Dataset.The y-axis values were transformed using the square root function for better data visualization.https://doi.org/10.1371/journal.ppat.1011980.g008 Table B in S2 Dataset, lower panel), indicating none were amplified during the initial pupal stage.Subsequently, the copy number of each gene increased progressively with each pupal stage (Fig 2A and Table B in S2 Dataset).While exhibiting lower amplification levels than the IVSPER genes we analyzed, a similar trend was observed for the wasp gene XRCC1 (Fig 2A and Table B in S2 Dataset).These findings indicated IVSPER amplification in calyx cells begins between pupal stage 1 (one day old, hyaline) and pupal stage 2, and further increases in pupal stage 3 (five days old, pigmented abdomen) and pupal stage 4.

Table 1 .
(Continued)Most amplified regions contained a single HdIV locus, but seven contained a mix of HdIV genome components (Table1).Three amplified regions contained the neighboring and closely related proviral segments mentioned above (e.g., Hd36 and Hd38 on Scaffold-1, Hd44.1 and Hd44.2 on Scaffold-2, Hd12 and Hd16 on Scaffold-11).In addition to the two examples noted above on Scaffold 11 (see Fig 4B